Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework have achieved remarkable success on various TOD benchmarks. However, we argue that current TOD benchmarks are only limited surrogates for real-world scenarios and that current TOD models are still a long way from handling those scenarios. In this position paper, we first identify the current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, an alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.
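Over the years, 2D GANs have achieved great success in photorealistic portrait generation. However, they lack 3D understanding in the generation process and therefore suffer from multi-view inconsistency. To alleviate this issue, many 3D-aware GANs have been proposed and have shown notable results, but 3D GANs struggle with editing semantic attributes. The controllability and interpretability of 3D GANs have not been much explored. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. We first introduce a novel 3D-aware GAN, SURF-GAN, which is capable of discovering semantic attributes during training and controlling them in an unsupervised manner. After that, we inject the prior of SURF-GAN into StyleGAN to obtain a high-fidelity 3D-controllable generator. Unlike existing latent-based methods that allow implicit pose control, the proposed 3D-controllable StyleGAN enables explicit pose control over portrait generation. This distillation allows direct compatibility between 3D control and many StyleGAN-based techniques (e.g., inversion and stylization), and also brings an advantage in terms of computational resources. Our code is available at https://github.com/jgkwak95/surf-gan.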
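Adverse weather image translation belongs to the unsupervised image-to-image (I2I) translation task, which aims to transfer an adverse-condition domain (e.g., rainy night) to a standard domain (e.g., day). It is a challenging task because images from adverse domains contain artifacts and insufficient information. Recently, many studies employing generative adversarial networks (GANs) have achieved notable success in I2I translation, but there are still limitations in applying them to adverse weather enhancement. A symmetric architecture based on a bidirectional cycle-consistency loss is adopted as the standard framework for unsupervised domain transfer methods. However, it can lead to inferior translation results if the two domains carry imbalanced information. To address this issue, we propose a novel GAN model, AU-GAN, which has an asymmetric architecture for adverse domain translation. We insert the proposed feature transfer network (${T}$-net) only in the normal-domain generator (i.e., rainy night -> day) to enhance the encoded features of the adverse-domain image. In addition, we introduce asymmetric feature matching for disentanglement of the encoded features. Finally, we propose an uncertainty-aware cycle-consistency loss to address the regional uncertainty of a cyclically reconstructed image. We demonstrate the effectiveness of our method through qualitative and quantitative comparisons with state-of-the-art models. Code is available at https://github.com/jgkwak95/au-g.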
Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.
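The question answering result above is reported in mean reciprocal rank (MRR). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: for each question, take the reciprocal of the rank of the first correct answer (0 if none is returned), then average over questions. The function name and the toy inputs are illustrative assumptions and are not taken from the BioBERT evaluation code.

```python
def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average of 1/rank of the first correct answer per question (0 if absent)."""
    total = 0.0
    for candidates, gold in zip(ranked_lists, gold_sets):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(ranked_lists)


# Toy example (hypothetical data): correct answers at rank 1, rank 3, and missing.
ranked = [["a", "b"], ["x", "y", "z"], ["q"]]
gold = [{"a"}, {"z"}, {"none"}]
print(mean_reciprocal_rank(ranked, gold))
```

In the toy example the three reciprocal ranks are 1, 1/3 and 0, so the score is their mean, roughly 0.444.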
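We address the task of person search, that is, localizing and re-identifying query persons from a set of raw scene images. Recent approaches are typically built upon OIMNet, a pioneering work on person search that learns joint person representations for performing both detection and person re-identification (reID). To obtain the representations, they extract features from pedestrian proposals and then project them onto a unit hypersphere with L2 normalization. These methods also incorporate all positive proposals, i.e., those that sufficiently overlap with the ground truth, equally when learning person representations for reID. We have found that 1) L2 normalization that does not consider feature distributions degrades the discriminative power of person representations, and 2) positive proposals often also depict background clutter and overlapping persons, which can encode noisy features into the person representations. In this paper, we introduce OIMNet++, which addresses these limitations. To this end, we introduce a novel normalization layer, dubbed ProtoNorm, that calibrates features from pedestrian proposals while considering the long-tail distribution of person IDs, enabling L2-normalized person representations to be discriminative. We also propose a localization-aware feature learning scheme that encourages better-aligned proposals to contribute more to learning discriminative representations. Experimental results and analysis on standard person search benchmarks demonstrate the effectiveness of OIMNet++.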
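To illustrate why L2 normalization that ignores the feature distribution can hurt discriminability, the sketch below compares plain L2 normalization with a calibrate-then-normalize step. It assumes that the calibration statistics are taken over per-ID prototypes (the mean feature of each ID), so that an ID with many samples does not dominate them. This is only one reading of the description above; the function names and toy data are hypothetical and this is not the authors' ProtoNorm implementation.

```python
import math


def l2_normalize(v):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def prototype_stats(features, ids, eps=1e-5):
    """Per-dimension mean/std over per-ID prototypes (assumed calibration statistics)."""
    groups = {}
    for f, pid in zip(features, ids):
        groups.setdefault(pid, []).append(f)
    # One prototype per ID: the mean feature of that ID, regardless of sample count.
    protos = [[sum(col) / len(fs) for col in zip(*fs)] for fs in groups.values()]
    mean = [sum(col) / len(protos) for col in zip(*protos)]
    std = [math.sqrt(sum((a - m) ** 2 for a in col) / len(protos) + eps)
           for col, m in zip(zip(*protos), mean)]
    return mean, std


def calibrate_then_normalize(f, mean, std):
    """Standardize with prototype statistics, then L2-normalize."""
    return l2_normalize([(a - m) / s for a, m, s in zip(f, mean, std)])


def cosine(u, v):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(a * b for a, b in zip(u, v))


# Toy long-tailed data (hypothetical): ID "a" has three samples, ID "b" has one.
features = [[10.0, 0.0], [10.0, 0.0], [10.0, 0.0], [12.0, 2.0]]
ids = ["a", "a", "a", "b"]
mean, std = prototype_stats(features, ids)

plain = cosine(l2_normalize(features[0]), l2_normalize(features[3]))
calibrated = cosine(calibrate_then_normalize(features[0], mean, std),
                    calibrate_then_normalize(features[3], mean, std))
print(round(plain, 3), round(calibrated, 3))
```

In this toy case the two identities are almost indistinguishable after plain L2 normalization, because both features sit far from the origin in the same direction, while centering and scaling with prototype statistics first places them on opposite sides of the hypersphere.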
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable results, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fréchet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $128^{2}$ resolution. Additionally, we provide FIDs of generated 3D-aware images for each class of the datasets, since $\text{C}^{3}$G-NeRF can synthesize class-conditional images.
In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique to corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years. Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88 on our dataset.
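The abstract does not say which contrastive objective is used. As a rough illustration of how a contrastive loss separates similar and dissimilar image pairs, the sketch below uses the common margin-based pairwise form on embedding vectors, such as the outputs of a projection head. The function names, the margin value, and the toy embeddings are assumptions made for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_contrastive_loss(u, v, same_individual, margin=1.0):
    """Margin-based pairwise contrastive loss (one common choice, assumed here).

    Pairs from the same individual are pulled together (squared distance);
    pairs from different individuals are pushed apart until they are at
    least `margin` away from each other.
    """
    d = euclidean(u, v)
    if same_individual:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings of three photographs.
photo_a1, photo_a2, photo_b = [0.0, 0.0], [0.1, 0.0], [0.3, 0.4]
print(pairwise_contrastive_loss(photo_a1, photo_a2, same_individual=True))
print(pairwise_contrastive_loss(photo_a1, photo_b, same_individual=False))
```

With this form, a pair of different individuals that is already farther apart than the margin contributes no loss, so training effort goes to the pairs that are still confusable.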
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
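The greedy policy described above can be made concrete in the oracle setting, where the joint distribution is known exactly. The sketch below builds a small discrete toy distribution and, at each step, queries the unobserved feature with the largest conditional mutual information with the label given the values seen so far. It illustrates only the oracle greedy rule, not the amortized learning approach; the toy distribution and all function names are assumptions.

```python
import math
from itertools import product


def build_joint(flip_probs):
    """Toy joint p(x, y): Y is a fair coin; X_i copies Y, flipped with prob flip_probs[i]."""
    joint = {}
    for y in (0, 1):
        for x in product((0, 1), repeat=len(flip_probs)):
            p = 0.5
            for xi, q in zip(x, flip_probs):
                p *= q if xi != y else 1.0 - q
            joint[(x, y)] = p
    return joint


def cmi(joint, i, observed):
    """I(Y; X_i | X_S = x_S) in nats, where `observed` maps feature index -> value."""
    # Keep only outcomes consistent with the values observed so far.
    cond = {k: p for k, p in joint.items()
            if all(k[0][j] == v for j, v in observed.items())}
    z = sum(cond.values())
    p_xy, p_x, p_y = {}, {}, {}
    for (x, y), p in cond.items():
        p /= z
        p_xy[(x[i], y)] = p_xy.get((x[i], y), 0.0) + p
        p_x[x[i]] = p_x.get(x[i], 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log(p / (p_x[a] * p_y[b]))
               for (a, b), p in p_xy.items() if p > 0)


def greedy_select(joint, x_true, budget):
    """Sequentially query the feature with the highest CMI, revealing its true value."""
    observed, order = {}, []
    for _ in range(budget):
        candidates = [i for i in range(len(x_true)) if i not in observed]
        best = max(candidates, key=lambda i: cmi(joint, i, observed))
        observed[best] = x_true[best]
        order.append(best)
    return order


# X0 is a reliable copy of Y, X1 a noisier copy, X2 pure noise.
joint = build_joint([0.1, 0.3, 0.5])
print(greedy_select(joint, x_true=(1, 0, 1), budget=2))  # [0, 1]
```

The conditioning step inside `cmi` is what needs oracle access: it requires the exact conditional distribution given any observed subset, which is generally unavailable in practice and is the motivation for learning the policy instead.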